Hardware/Software Coherence in Hybrid Memory Models
Abstract
Current cache coherence protocols limit the scalability of chip multiprocessor (CMP) architectures. The expected increase in the number of cores in next-generation CMPs calls for an evolution of the memory subsystem. One solution is to introduce a local memory alongside the cache hierarchy, forming a hybrid memory model. On the one hand, local memories are more power-efficient than caches and do not generate coherence traffic. On the other hand, local memories suffer from poor programmability, so programmers rely on automatic code transformations to operate them. When memory access patterns are unpredictable, compilers fail to generate code that manages the local memory, because doing so requires complex memory aliasing analyses. The underlying cause is the incoherence between the local memory and the cache hierarchy. This paper proposes a coherence protocol for hybrid memory models that allows the compiler to generate code even in the presence of aliasing problems. Coherence is ensured by a simple software/hardware co-design that identifies potentially incoherent memory accesses and diverts them to the correct copy of the data. The coherence protocol does not maintain two coherent copies of the data, so no coherence traffic is generated and the overhead is negligible, 0.2% on average. Compared to traditional cache-based architectures, the hybrid memory model with the proposed coherence protocol achieves an average speedup of 1.5x.
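The idea of diverting a potentially incoherent access to the valid copy can be illustrated with a small software model. The abstract does not describe the actual hardware structures or instructions, so everything below is a hypothetical sketch: the class `HybridMemory`, the mapping table `mapped`, and the `guarded_load`/`guarded_store` operations are assumed names, and a simple linear lookup stands in for whatever mechanism the paper uses. The model assumes that while a region is mapped to the local memory, the local copy is the only valid one, and that nothing keeps the global copy up to date in the meantime.

```python
class HybridMemory:
    """Toy model of a cache hierarchy plus a software-managed local memory.

    Hypothetical illustration only; not the protocol's real interface.
    """

    def __init__(self, global_size, local_size):
        self.global_mem = [0] * global_size  # stands in for caches + main memory
        self.local_mem = [0] * local_size    # software-managed local memory
        self.mapped = []                     # (global_base, local_base, length)

    def map_to_local(self, global_base, local_base, length):
        # Copy a region into the local memory; from now on the local
        # copy is treated as the valid one.
        self.local_mem[local_base:local_base + length] = (
            self.global_mem[global_base:global_base + length]
        )
        self.mapped.append((global_base, local_base, length))

    def unmap(self, global_base):
        # Write the region back and make the global copy valid again.
        for entry in self.mapped:
            if entry[0] == global_base:
                g, l, n = entry
                self.global_mem[g:g + n] = self.local_mem[l:l + n]
                self.mapped.remove(entry)
                return
        raise KeyError(global_base)

    def _divert(self, addr):
        # Return the local-memory index holding addr, or None if unmapped.
        for g, l, n in self.mapped:
            if g <= addr < g + n:
                return l + (addr - g)
        return None

    def guarded_load(self, addr):
        # A potentially aliasing load: read whichever copy is valid.
        idx = self._divert(addr)
        return self.global_mem[addr] if idx is None else self.local_mem[idx]

    def guarded_store(self, addr, value):
        # A potentially aliasing store: update only the valid copy.
        idx = self._divert(addr)
        if idx is None:
            self.global_mem[addr] = value
        else:
            self.local_mem[idx] = value
```

In this model a compiler that cannot prove whether a pointer aliases a mapped region would emit the guarded form of the access; accesses it can classify statically would go directly to `local_mem` or `global_mem`. Because only one copy is ever updated, no traffic is needed to keep two copies in sync, which mirrors the claim in the abstract.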
Similar resources
An Argument for Simple COMA
We present design details and some initial performance results of a novel scalable shared memory multiprocessor architecture that incorporates the major strengths of several contemporary multiprocessor architectures while avoiding their most serious weaknesses. Specifically, our architecture design incorporates the automatic data migration and replication features of cache-only memory architectu...
Adaptive Coherence Batching for Trap-Based Memory Architectures
Both software-initiated and hardware-initiated prefetching have been used to accelerate shared-memory server performance. While software-initiated prefetching requires instruction set and compiler support, hardware prefetching often requires additional hardware structures or extra memory state. The coherence batching scheme proposed in this paper keeps the system completely binary transparent and...
Software Cache Coherence for Large Scale Multiprocessors
Shared memory is an appealing abstraction for parallel programming. It must be implemented with caches in order to perform well, however, and caches require a coherence mechanism to ensure that processors reference current data. Hardware coherence mechanisms for large-scale machines are complex and costly, but existing software mechanisms for message-passing machines have not provided a perform...
Specification-based Verification in a Distributed Shared Memory Simulation Model
The emergence of chip multiprocessors is leading to rapid advances in hardware and software systems to provide distributed shared memory (DSM) programming models, so-called DSM systems. A DSM system provides programming advantages within a scalable and cost-effective hardware solution. This benefit derives from the fact that a DSM system creates a shared-memory abstraction on top of a distribut...
High Performance Software Coherence for Current and Future Architectures
Shared memory provides an attractive and intuitive programming model for large-scale parallel computing, but requires a coherence mechanism to allow caching for performance while ensuring that processors do not use stale data in their computation. Implementation options range from distributed shared memory emulations on networks of workstations to tightly-coupled fully cache-coherent distributed...